Software Tools for Earth and Environmental Science

– 12th WEEK –

Emir Toker

06/12/2019

R Probability

  • Syllabus and Book
  • DataCamp Class
  • R - Elemantary Statistics
  • Basic Data Visualization
  • R - Probability

Coffee Break

  • R - Common Probability Distributions
  • Practice : Write A Function

QUIZ

Next Week

Syllabus and Book

Syllabus

Book

DataCamp Class

DataCamp Class

LINK

R - Elemantary Statistics

R - Elemantary Statistics

R - Elemantary Statistics

Centrality: Mean, Median, Mode

  • Measures of centrality are commonly used to explain large collections of data by describing where numeric observations are centered.

R - Elemantary Statistics

Centrality: Mean, Median, Mode

  • Median : “middle magnitude” of your observations

0 . 0 . 0 . 0 . 0

o . o . o . o . o . o

R - Elemantary Statistics

Centrality: Mean, Median, Mode

  • Mode : Simply the “most common” observation.

Sample : 2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4

2 , 2 , 2 , 2.2 , 3 , 3 , 4, 4.4 ( n=8 , n/2 = 4)

R - Elemantary Statistics

Centrality: Mean, Median, Mode

xdata <- c(2,4.4,3,3,2,2.2,2,4)

  • mean(xdata)
  • median(xdata)
  • min(xdata)
  • max(xdata)
  • range(xdata)

R - Elemantary Statistics

Quantiles, Percentiles, and the Five-Number Summary

  • A quantile is a value indicates an observation rank when compared to all the other present observations.
  • For example, the median is itself a quantile. It’s the 0.5th quantile.
  • Alternatively, quantiles can be expressed as a percentile.

The median = the 0.5th quantile = The 50th percentile

Sample : 2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4

2 , 2 , 2 , 2.2 , 3 , 3 , 4, 4.4

0.5th quantile = median = 2.6

R - Elemantary Statistics

Quantiles, Percentiles, and the Five-Number Summary

## 80% 
## 3.6
##   0%  25%  50%  75% 100% 
## 2.00 2.00 2.60 3.25 4.40
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   2.600   2.825   3.250   4.400

R - Elemantary Statistics

Quantiles, Percentiles, and the Five-Number Summary

A quartile is a type of quantile.

R - Elemantary Statistics

Quantiles , Percentiles, and the Five-Number Summary

xdata <- c(2,4.4,3,3,2,2.2,2,4)

  • quantile(xdata)
  • summary(xdata)

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

  • How dispersed your data are. For this, measures of spread are needed.
## [1] 2.825
## [1] 2.825

R - Elemantary Statistics

the observations in ydata are more “spread out”

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

  • The sample variance measures the degree of the spread of numeric observations around their arithmetic mean.

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4 ( mean = 2.825)

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

  • The standard deviation is simply the square root of the variance. The scale of the original observations.

0.953 represents the average distance of each observation from the mean

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range,

  • Unlike the variance and standard deviation, the interquartile range (IQR) is not computed with respect to the sample mean.

  • IQR is computed as the difference between the upper and lower quartiles of your data

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

## [1] 0.9078571
## [1] 0.9528154
## [1] 1.25

R - Elemantary Statistics

R - Elemantary Statistics

R - Elemantary Statistics

Spread: Variance, Standard Deviation, and the Interquartile Range

xdata <- c(2,4.4,3,3,2,2.2,2,4)

  • var()
  • sd()
  • IQR()

R - Elemantary Statistics

Covariance and Correlation

  • Investigate the relationship between two numeric variables to assess trends
  • The covariance expresses how much two numeric variables “change together” and the nature of that relationship, whether it is positive or negative.

R - Elemantary Statistics

Covariance and Correlation

x = {x1,x2,…,xn}

y = {y1,y2,…,yn}

for i = 1,. . . ,n

When you get a positive result for rxy, it shows that there is a positive lin- ear relationship. When rxy = 0, this indicates that there is no linear relationship.

R - Elemantary Statistics

Covariance and Correlation

x = {2,4.4,3,3,2,2.2,2,4}

y = {1,4.4,1,3,2,2.2,2,7}

mean x and y = 2.825

positive relationship

R - Elemantary Statistics

Covariance and Correlation

R - Elemantary Statistics

Covariance and Correlation

  • Correlation allows you to interpret the covariance further by identifying the strength of any association.

R - Elemantary Statistics

Covariance and Correlation

  • Most common of these is Pearson’s product-moment correlation coefficient. (R default)

  • The correlation coefficient estimates the nature of the linear relationship between two sets of observations

−1 ≤ ρxy ≤ 1

ρxy = 1, which is a perfect positive linear relationship

R - Elemantary Statistics

Covariance and Correlation

x = {2,4.4,3,3,2,2.2,2,4}

y = {1,4.4,1,3,2,2.2,2,7}

(mean x and y = 2.825)

(sx = 0.953 and sy = 2.013)

(rxy = 1.479)

ρxy is positive

R - Elemantary Statistics

Covariance and Correlation

R - Elemantary Statistics

Covariance and Correlation

## [1] 1.479286
## [1] 0.7713962

R - Elemantary Statistics

Covariance and Correlation

  • cov( , )
  • cor( , )

R - Elemantary Statistics

R - Elemantary Statistics

Basic Data Visualization

Basic Data Visualization

Barplots and Pie Charts

##   sta_no year month day hour minutes temp precipitation pressure
## 1  18397 2017     7  26   18       0 23.9             0   1003.0
## 2  18397 2017     7  26   18      15 23.9             0   1003.1
## 3  18397 2017     7  26   18      30 23.8             0   1003.2
## 4  18397 2017     7  26   18      45 23.8             0   1003.2
## 5  18397 2017     7  26   19       0 23.6             0   1003.2
## 6  18397 2017     7  26   19      15 23.2             0   1003.1
##   relative_humidity
## 1                94
## 2                95
## 3                96
## 4                96
## 5                96
## 6                97

Basic Data Visualization

Barplots and Pie Charts

## [1] 23.9 23.9 23.8 23.8 23.6 23.2

Basic Data Visualization

Barplots and Pie Charts

##   [1] 23.9 23.9 23.8 23.8 23.6 23.2 23.2 23.1 23.0 22.8 22.5 22.4 22.2 22.3
##  [15] 22.2 21.7 21.9 21.7 21.6 22.2 22.2 22.1 22.3 22.5 22.3 22.2 22.5 22.6
##  [29] 22.6 22.6 22.6 22.7 22.6 22.5 22.6 22.5 22.5 22.4 22.5 22.4 22.5 22.6
##  [43] 23.0 23.2 24.2 25.1 25.5 26.1 27.1 26.9 27.6 28.0 28.4 28.5 29.3 30.2
##  [57] 30.1 30.1 30.4 30.4 30.8 30.9 31.0 31.5 31.2 30.9 30.9 30.4 30.4 30.0
##  [71] 29.2 29.5 29.4 29.3 29.6 28.8 29.0 29.0 29.2 28.4 27.8 27.4 26.6 26.2
##  [85] 25.8 25.6 25.4 24.2 19.2 19.5 20.1 20.8 21.2 21.4 21.4 21.4 21.2 21.0
##  [99] 20.8 20.9 20.8 20.7 20.8 20.8 20.9 20.6 20.6 20.5 20.7 20.8 20.4 20.4
## [113] 20.6 20.5 20.4 20.5 20.5 20.6 20.5 20.5 20.4

Basic Data Visualization

Barplots and Pie Charts

## 
## 19.2 19.5 20.1 20.4 20.5 20.6 20.7 20.8 20.9   21 21.2 21.4 21.6 21.7 21.9 
##    1    1    1    4    6    4    2    6    2    1    2    3    1    2    1 
## 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8   23 23.1 23.2 23.6 23.8 23.9 24.2 
##    1    5    3    3    8    7    1    1    2    1    3    1    2    2    2 
## 25.1 25.4 25.5 25.6 25.8 26.1 26.2 26.6 26.9 27.1 27.4 27.6 27.8   28 28.4 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    2 
## 28.5 28.8   29 29.2 29.3 29.4 29.5 29.6   30 30.1 30.2 30.4 30.8 30.9   31 
##    1    1    2    2    2    1    1    1    1    2    1    4    1    3    1 
## 31.2 31.5 
##    1    1

Basic Data Visualization

Barplots and Pie Charts

Basic Data Visualization

Barplots and Pie Charts

Basic Data Visualization

Barplots and Pie Charts

## Warning: package 'ggplot2' was built under R version 3.5.2

Basic Data Visualization

Barplots and Pie Charts

head(station_data$precipitation)

pie(table(station_data$precipitation),labels=c("V1","V2","V3","V4","V5"),col=c("white","blue","green","orange"),main="pie chart for precipitation")

Basic Data Visualization

Histogram

Basic Data Visualization

Histogram

Basic Data Visualization

Histogram

Basic Data Visualization

Boxplot

## [1] 24.24132
## [1] 22.6
##   0%  25%  50%  75% 100% 
## 19.2 21.4 22.6 27.6 31.5

Basic Data Visualization

Histogram and Boxplot

Basic Data Visualization

Boxplot

Basic Data Visualization

Scatter Plots

Basic Data Visualization

Scatter Plots

##   temp precipitation pressure relative_humidity
## 1 23.9             0   1003.0                94
## 2 23.9             0   1003.1                95
## 3 23.8             0   1003.2                96
## 4 23.8             0   1003.2                96
## 5 23.6             0   1003.2                96
## 6 23.2             0   1003.1                97

Basic Data Visualization

R - Probability

R - Probability

A probability is a number that describes the “magnitude of chance” associated with making a particular observation or statement.

It’s always a number between 0 and 1 (inclusive) and is often expressed as a fraction.

PDF - Probability Density Function

lower < 7 < upper

X >= 2  &  X <= 7
(X[lower] - 1)/36

X > 7 & X <= 12
13 - X[upper])/36

R - Probability - Shape

  • Symmetry : Draw a vertical line down the center, and it is equally reflected with 0.5 probability.

  • Skew : If a distribution is asymmetric, look at the “tail” of a distribution. Positive or right skew indicates a tail extending longer to the right of center.

  • Modality : Modality describes the number of easily identifiable peaks in the distribution of interest. Unimodal, bimodal, and trimodal…

## 
## 19.2 19.5 20.1 20.4 20.5 20.6 20.7 20.8 20.9   21 21.2 21.4 21.6 21.7 21.9 
##    1    1    1    4    6    4    2    6    2    1    2    3    1    2    1 
## 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8   23 23.1 23.2 23.6 23.8 23.9 24.2 
##    1    5    3    3    8    7    1    1    2    1    3    1    2    2    2 
## 25.1 25.4 25.5 25.6 25.8 26.1 26.2 26.6 26.9 27.1 27.4 27.6 27.8   28 28.4 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    2 
## 28.5 28.8   29 29.2 29.3 29.4 29.5 29.6   30 30.1 30.2 30.4 30.8 30.9   31 
##    1    1    2    2    2    1    1    1    1    2    1    4    1    3    1 
## 31.2 31.5 
##    1    1
##    Var1 Freq
## 1  19.2    1
## 2  19.5    1
## 3  20.1    1
## 4  20.4    4
## 5  20.5    6
## 6  20.6    4
## 7  20.7    2
## 8  20.8    6
## 9  20.9    2
## 10   21    1
## 11 21.2    2
## 12 21.4    3
## 13 21.6    1
## 14 21.7    2
## 15 21.9    1
## 16 22.1    1
## 17 22.2    5
## 18 22.3    3
## 19 22.4    3
## 20 22.5    8
## 21 22.6    7
## 22 22.7    1
## 23 22.8    1
## 24   23    2
## 25 23.1    1
## 26 23.2    3
## 27 23.6    1
## 28 23.8    2
## 29 23.9    2
## 30 24.2    2
## 31 25.1    1
## 32 25.4    1
## 33 25.5    1
## 34 25.6    1
## 35 25.8    1
## 36 26.1    1
## 37 26.2    1
## 38 26.6    1
## 39 26.9    1
## 40 27.1    1
## 41 27.4    1
## 42 27.6    1
## 43 27.8    1
## 44   28    1
## 45 28.4    2
## 46 28.5    1
## 47 28.8    1
## 48   29    2
## 49 29.2    2
## 50 29.3    2
## 51 29.4    1
## 52 29.5    1
## 53 29.6    1
## 54   30    1
## 55 30.1    2
## 56 30.2    1
## 57 30.4    4
## 58 30.8    1
## 59 30.9    3
## 60   31    1
## 61 31.2    1
## 62 31.5    1

R - Common Probability Distributions

R - Common Probability Mass Functions

For discrete random variables

  • Bernoulli Distribution : has only two possible outcomes, such as success or failure.

Bernoulli Distribution

R - Common Probability Mass Functions

For discrete random variables

  • Binomial Distribution : The binomial distribution is the distribution of successes in n number of trials involving binary discrete random variables.

Binomial Distribution

There are four functions associated with Binomial distributions.

  • dbinom(x, size, prob)
  • pbinom(x, size, prob)
  • qbinom(p, size, prob)
  • rbinom(n, size, prob)
  • x is a vector of numbers.
  • p is a vector of probabilities.
  • n is number of observations.
  • size is the number of trials.
  • prob is the probability of success of each trial.

Binomial Distribution - dbinom

It is a density or distribution function.

## [1] 0.03125
## [1] 0.2734375
## [1] 0.00390625 0.03125000 0.10937500 0.21875000 0.27343750 0.21875000
## [7] 0.10937500 0.03125000 0.00390625

Binomial Distribution - dbinom

Binomial Distribution - dbinom

Binomial Distribution - dbinom

R - Common Probability Mass Functions

  • Poisson Distribution : important and rarely seen event.

λp should be interpreted as the “mean number of occurrences”

Poisson Distribution

There are three functions associated with Binomial distributions.

  • dpois(x, lambda)
  • ppois(q, lambda, lower.tail)
  • qpois(p, lambda, lower.tail)
  • rpois(n, lambda)
  • x : successes in a period
  • λ : the expected number of events
  • lower.tail = TRUE for left tail
  • q vector of quantiles
  • n number of random values to return
  • p vector of probabilities

Poisson Distribution

Next Week

  • R - Common Probability Density Functions

    • Uniform
    • Normal
    • Student’s t-distribution
    • Exponential
    • (gamma, beta, log-normal)
  • Practice : Write A Function

  • Practice : Netcdf Packages

  • QUIZ